Lexical Knowledge Acquisition from Corpora

Authors

  • Takehito Utsuro
  • Yuji Matsumoto

Abstract

This paper presents a computational environment to support the development of a lexicon for natural language processing. The underlying idea of the environment is to use up-to-date language technologies to minimize both the human labor and the inconsistency that are unavoidable in the manual compilation of a lexicon. The proposed computational environment enables efficient construction of a consistent and fertile lexicon. Among the major components of the environment, this paper focuses on the compilation (or acquisition) of a subcategorization frame lexicon from parsed corpora. In particular, it discusses issues in the semi-automatic sense classification of polysemous verbs and in probabilistic model learning of subcategorization preference.

Similar resources

An Application of Lexical Semantics to Knowledge Acquisition from Corpora

In this paper, we describe a program of research designed to explore how a lexical semantic theory may be exploited for extracting information from corpora suitable for use in Information Retrieval applications. Unlike with purely statistical collocational analyses, the framework of a semantic theory allows the automatic construction of predictions about semantic relationships among words ap...

Combining NLP and statistical techniques for lexical acquisition

The growing availability of large on-line corpora encourages the study of word behaviour directly from accessible raw texts. However, the methods by which lexical knowledge should be extracted from plain texts are still a matter of debate and experimentation. This paper presents an integrated tool for lexical acquisition from corpora, ARIOSTO, based on a hybrid methodology that combines ...

Corpus-Based Induction of Lexical Representation and Meaning

The acquisition of linguistic knowledge, i.e., the identification, extraction, and encoding of linguistic information in a corpus, has been one of the main motivations for data-driven approaches to natural language. Methods have been developed for the acquisition of, for instance, parts of speech, noun compounds, collocations, support verbs, subcategorization frames, phrase structure rules, selec...

Lexical Database for Multiple Languages: Multilingual Word Semantic Network

Data mining and knowledge engineering have become difficult tasks because of the large amount of data now available on the web. The validity and reliability of data have also become a major point of debate in knowledge acquisition. In addition, acquiring knowledge from different languages has become another concern. There are many language translators and corpora developed but the function of these translators an...

Automatic lexical acquisition from corpora: some limitations and tentative solutions

This paper deals with lexical acquisition. We take another look at some experiments we have recently carried out on the automatic acquisition of lexical resources from French corpora. We describe the architecture of our system for lexical acquisition. We formulate the hypothesis that some of the limitations of the current system are mainly due to a poor representation of the constraints used. F...

In So Many Words: Knowledge as a Lexical Phenomenon

Lexical knowledge is knowledge that can be expressed in words. Circular though this may seem, we think it provides a perfectly reasonable point of departure, for, in line with a long-standing philosophical tradition, it posits communicability as the most characteristic aspect of lexical knowledge. Knowledge representation systems should be designed so as to fit lexical data rather than the other...

Publication date: 2007